Abstract

We give a polynomial time algorithm for the lossy population recovery problem. In this
problem, the goal is to approximately learn an unknown distribution on binary strings of length
$n$ from lossy samples: for some parameter $\mu$ each coordinate of the sample is preserved with
probability $\mu$ and otherwise is replaced by a '?'. The running time and number of samples
needed for our algorithm is polynomial in $n$ and $1/\varepsilon$ for each fixed $\mu > 0$. This improves on the
algorithm of Wigderson and Yehudayoff [26] that runs in quasi-polynomial time for any fixed
$\mu > 0$ and the polynomial time algorithm of Dvir et al [9] which was shown to work for $\mu \gtrsim 0.30$
by Batman et al [3]. In fact, our algorithm also works in the more general framework of Batman
et al [3] in which there is no a priori bound on the size of the support of the distribution. The
algorithm we analyze is implicit in previous work [9, 3]; our main contribution is to analyze the
algorithm by showing (via linear programming duality and connections to complex analysis)
that a certain matrix associated with the problem has a robust local inverse even though its
condition number is exponentially small. A corollary of our result is the first polynomial time
algorithm for learning DNFs in the restriction access model of Dvir et al [9] and hence this
model joins the random walk model of Bshouty et al [5] as the only examples of passive learning
in which DNFs can be learned in polynomial time.

1 Introduction



1.1 Background and Our Results 

The population recovery problem was introduced by Dvir et al [9] and also studied by Wigderson
and Yehudayoff [26]. To describe this basic statistical problem, we will borrow an example from
[26]: 

Imagine that you are a paleontologist, who wishes to determine the population of
dinosaurs that roamed the Earth before the hypothesized meteor made them extinct.
Typical observations of dinosaurs consist of finding a few teeth of one here, a tailbone
of another there, perhaps with some more luck a skull and several vertebrae of a third,
and rarely a near complete skeleton of a fourth.... Using these fragments, you are
supposed to figure out the population of dinosaurs, namely a complete description of (say)
the bone skeleton of each species and the fraction of each species occupied in the entire
dinosaur population.

To make this precise, suppose there is an unknown distribution $\pi$ over binary strings of length
$n$. We are given samples from the following model:

• Choose a string $a$ according to $\pi$

• Replace each coordinate with a "?" independently with probability $1 - \mu$
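
The two-step sampling model above can be sketched in a few lines of Python. This is only an illustration: the function name `lossy_sample`, the dictionary representation of $\pi$ and the toy distribution are assumptions made for the example, not part of the original formulation.

```python
import random

def lossy_sample(pi, mu, rng):
    """Draw one lossy sample: pick a string from pi, then erase coordinates.

    pi  : dict mapping binary strings (e.g. "0110") to probabilities
    mu  : probability that a coordinate is preserved
    rng : random.Random instance, passed in so that runs are reproducible
    """
    strings = list(pi)
    weights = [pi[s] for s in strings]
    a = rng.choices(strings, weights=weights, k=1)[0]
    # Each coordinate survives with probability mu, otherwise it becomes '?'.
    return "".join(c if rng.random() < mu else "?" for c in a)

rng = random.Random(0)
pi = {"0000": 0.5, "1010": 0.3, "1111": 0.2}   # toy distribution (illustrative)
samples = [lossy_sample(pi, 0.4, rng) for _ in range(5)]
print(samples)
```

With `mu = 1.0` the sampler returns the chosen string unchanged, and with `mu = 0.0` every coordinate is erased, which matches the two extremes of the model.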

The goal is to reconstruct the distribution up to an additive error $\varepsilon > 0$. We would like to output
a set of strings $S$ and for each string $a$ in $S$, an estimate $\tilde\pi(a)$ of $\pi(a)$ with the requirement that
each of these estimates is within $\varepsilon$ of $\pi(a)$, and for every string $a \notin S$, $\pi(a)$ must be at most $\varepsilon$.
This formulation of the problem is adapted from [3]; in the original version in [9] the support size
of the distribution (which we will denote by $k$) is also a parameter.

We remark that the maximum likelihood estimator can be computed efficiently using a convex 
program [2]. Yet the challenge is in showing that few samples are needed information theoretically. 
We will see another instance of this type of issue in our paper: our approach is based on the notion 
of a 'robust local inverse' (defined later) [9], which is easy to compute and the challenge is in 
showing that a good robust local inverse exists for any fixed $\mu > 0$.

Dvir et al [9] gave a polynomial time algorithm for lossy population recovery for any $\mu \gtrsim 0.365$;
their analysis was improved by Batman et al [3] who showed that the same algorithm works for
any $\mu > 1 - 1/\sqrt{2} \approx 0.30$. Wigderson and Yehudayoff [26] gave an alternate approach based on
a method termed "partial identification" that runs in time quasi-polynomial in the support size
$k$ for any fixed $\mu > 0$. Interestingly, Wigderson and Yehudayoff [26] show that their framework
cannot be used to get a polynomial time algorithm (the number of samples needed is at least
$k^{\log\log k}$). We remark that their algorithm works even in the presence of corruptions, not just
erasures (whereas ours does not).

A generalization of the population recovery problem was introduced in the seminal work of
Kearns et al [19], which they called the problem of learning mixtures of Hamming balls: Again, we
choose a string $a$ according to $\pi$ but now each bit in $a$ is flipped with probability $\eta_a < 1/2$ and
this probability is allowed to depend on $a$. Kearns et al [19] give algorithms for the special case in
which each flip probability is the same (which is exactly the noisy population recovery problem) and
their algorithms run in time exponential in the support size $k$. This is an interesting phenomenon
in learning distributions: for many problems we do not know how to achieve a running time
that is sub-exponential in the number of components. For example, this is the case when learning
mixtures of product distributions [12], learning juntas [21], learning decision trees [KJ], and learning
mixtures of Gaussians [23], [4]. In fact, many of these problems are inter-reducible [11], so with
this context it is interesting that the population recovery problem is one positive example where
we can avoid exponential dependence on $k$. But is there a polynomial time algorithm?

Here we give an estimator that solves the population recovery problem, such that for any fixed
$\mu > 0$, the running time and number of samples needed is polynomial in $n$ and $1/\varepsilon$.

Theorem 1.1. There is an efficient algorithm for the population recovery problem whose running
time and number of samples needed is $O((n/\varepsilon)^{2f(\mu)})$ where $f(\mu) = \frac{1}{\mu}\log\frac{2}{\mu} + O(1)$.

The population recovery problem arose naturally from the investigation of one of the central
problems in learning theory, learning DNFs. The best known algorithm for learning DNFs in
Valiant's PAC learning model [29] runs in time (roughly) $2^{n^{1/3}}$ [20]. There are much more efficient
algorithms that work in active settings, e.g. Jackson's celebrated algorithm that assumes the
learner is allowed to make value queries to the function [17]. The only efficient algorithm for
learning DNFs in a passive setting is the algorithm of Bshouty et al [5] where the learner is given
samples generated by a random walk on the hypercube.

Dvir et al [9] introduced a new model called restriction access that can be thought of as an
interpolation between black box and white box access to the function: Each example consists of a
restriction of the unknown DNF obtained by fixing a random $1 - \mu$ fraction of the input variables.
Dvir et al [9] showed how to reduce the problem of learning a $k$-term, $n$-variable DNF to solving
an instance of the population recovery problem on strings of length $n$ and support size $k$. Previous
algorithms for population recovery yield a polynomial time algorithm for learning DNFs in the
restriction access model for any $\mu > 1 - 1/\sqrt{2}$ [9, 3], and Wigderson and Yehudayoff [26] obtain
a quasi-polynomial time algorithm that runs in time $k^{\log k}$ (where $k$ is the number of clauses).
Combining the reduction of [9] with Theorem 1.1 immediately gives:

Theorem 1.2. There is an efficient algorithm to PAC learn DNFs in the restriction access model
for any $\mu > 0$. More precisely, the running time and number of samples needed is
$O((n/\varepsilon)^{2f(\mu)}\,\mathrm{poly}(n,k))$ where $f(\mu) = \frac{1}{\mu}\log\frac{2}{\mu}$
and the algorithm succeeds with high probability.

Hence the restriction access model joins the random walk model of Bshouty et al [5] as the only
two passive models where DNFs can be learned in polynomial time.

1.2 The Robust Local Inverse 

At a more philosophical level, what makes the population recovery problem particularly interesting 
is that in order to give an efficient estimator we need to solve a certain inverse problem despite the 
fact that the corresponding matrix has many exponentially small eigenvalues. As will be reviewed 
in the next section, Dvir et al [9] showed that the population recovery problem can be reduced to 
a problem of the following form: We have two unknown probability distributions $\pi$ and $\phi$ over the
domain $\{0, \ldots, n\}$, which when viewed as vectors indexed by $\{0, \ldots, n\}$ are related by the equation:

$$\phi^T = \pi^T A$$

where $A$ is a known (row) stochastic invertible matrix indexed by $\{0, \ldots, n\}$. We want to estimate
$\pi(0)$ but we only have access to samples chosen according to the distribution $\phi$. We would like the
running time and number of samples needed to be (at most) a polynomial in $n$ and $1/\varepsilon$.

Let $u$ denote the first column of $A^{-1}$. Then:

$$\pi(0) = \pi^T A u = \phi^T u$$

So, if we knew the vector $\phi$ exactly, we could use it to recover $\pi(0)$ exactly. But we do not know
$\phi$. We can estimate $\phi$ from random samples in the obvious way: let $\tilde\phi(j)$ be the fraction
of observed samples having exactly $j$ ones. We might then hope that $\tilde\pi(0) = \tilde\phi^T u$ is a good
estimate of $\pi(0)$. We will refer to this as the natural estimator for $\pi(0)$. The error $|(\tilde\phi - \phi)^T u|$ of
this estimator is at most $n\|\tilde\phi - \phi\|_\infty \|u\|_\infty$. Thus to obtain estimation error $\varepsilon$, it is enough that

$$\|\tilde\phi - \phi\|_\infty \le \frac{\varepsilon}{n\|u\|_\infty}$$

and the Chernoff-Hoeffding bound says that $C\log(n)\,(\|u\|_\infty n/\varepsilon)^2$ samples are enough so that the
probability of exceeding the desired error is inverse-polynomially small in $n$.

To ensure that this is not too many samples, we want $\|u\|_\infty$ to be polynomially bounded.
For the estimation problem that is derived from population recovery, it turns out that $\|u\|_\infty \le 1$
provided that $\mu > 1/2$ and so in this case the natural estimator yields an accurate estimate from
a polynomial number of samples. But if $\mu < 1/2$, then $\|u\|_\infty$ is exponentially large in $n$ and this
estimator requires exponentially many samples to be at all accurate.

What else can we do? The vector $u$ has entries that are too large, so Dvir et al [9] suggested
replacing $u$ by another vector $v$ whose entries are not too large and such that $\pi^T A v$ is close to $\pi^T A u$
for all distributions $\pi$. Remarkably, Dvir et al [9] managed to construct such a $v$ which works for
$\mu \gtrsim 0.365$ (the analysis was subsequently improved to $\mu \gtrsim 0.3$ [3]), which in turn yields a polynomial
time algorithm for the population recovery problem even in cases when the natural estimator fails!

Since $\pi^T A u = \pi(0)$, it follows that what we really want is to find a vector $v$ so that

$$\|Av - e_0\|_\infty \le \varepsilon$$

where $e_0$ is the indicator vector for zero (i.e. its first entry is one and the rest are all zero). And
furthermore we want $\|v\|_\infty$ to be as small as possible. A vector satisfying the above condition is
called an $\varepsilon$-local inverse for $A$ at $e_0$, and we will refer to $\|v\|_\infty$ as the sensitivity of $v$. If we can find
a $v$ whose sensitivity is at most $\sigma$, then $\mathrm{poly}(n, 1/\varepsilon, \sigma)$ samples suffice to get an estimate ($\tilde\phi^T v$) of
$\pi(0)$ that is within an additive $\varepsilon$.
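
As a minimal sketch of how a local inverse is used once it is known, the following computes the estimate $\tilde\phi^T v$ from observed class counts and reports the sensitivity $\|v\|_\infty$. The function names and the toy numbers are assumptions made for illustration; the vector `v` here is arbitrary rather than an actual local inverse.

```python
def estimate_pi0(counts, v):
    """Return the estimate phi_tilde^T v of pi(0).

    counts : counts[j] = number of observed samples that fall in class j
    v      : candidate local inverse, same length as counts
    """
    total = sum(counts)
    phi_tilde = [c / total for c in counts]      # empirical distribution
    return sum(p * w for p, w in zip(phi_tilde, v))

def sensitivity(v):
    """The sensitivity of v is its max-norm."""
    return max(abs(w) for w in v)

counts = [6, 3, 1]          # toy data: 10 samples over classes 0, 1, 2
v = [1.0, -1.0, 1.0]        # arbitrary vector, for illustration only
print(estimate_pi0(counts, v), sensitivity(v))
```

The larger the sensitivity, the more a small sampling error in `phi_tilde` can be amplified in the estimate, which is why the sample bound depends on $\sigma$.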

Geometrically, a local inverse is obtained by taking $A^{-1}e$ where $e$ is a small perturbation of
the vector $e_0$, which is chosen so that $A^{-1}e$ has small norm even though $A^{-1}e_0$ does not. What
controls the behavior of $A^{-1}e$ is the representation of $e$ in the basis of singular vectors of $A$. In
choosing $e$ we want to remove from $e_0$ the components corresponding to tiny singular values, which
will ensure that the sensitivity of $v = A^{-1}e$ is not too large. We are hoping that the weight
on these deleted components is small so that the result is a good local inverse.

The problem of finding the $\varepsilon$-local inverse of minimum sensitivity for a particular matrix $A$ can
be expressed directly as a linear program whose variables are the vector $v$ and the sensitivity $\sigma$:

$$
\begin{aligned}
\min\ & \sigma \\
& Av \ge e_0 - \varepsilon \mathbf{1} \\
& -Av \ge -e_0 - \varepsilon \mathbf{1} \\
& v + \sigma \mathbf{1} \ge 0 \\
& -v + \sigma \mathbf{1} \ge 0
\end{aligned}
\tag{1}
$$

The solution $v$ can be used to estimate $\pi(0)$ from $\tilde\phi$, where the number of samples depends on
$\sigma$, as above. Note that the matrix $A$ depends on $\mu$. Our main contribution is to prove that there
is a good solution to the above linear program for any $\mu > 0$.

The approach in Dvir et al [9] and in Batman et al [3] was to guess a solution to the above
linear program and bound its sensitivity. Instead, we consider the dual (maximization) problem
and prove an upper bound on its maximum. After some work, the dual problem becomes a problem
of finding a polynomial $p$ of degree $n$ so as to maximize $p(0) - \varepsilon\|p\|_1$, where $\|\cdot\|_1$ denotes the sum
of the absolute values of the coefficients, subject to the constraint that the translated polynomial
$q(x) = p(1 + \mu(x - 1))$ has $\|q\|_1 = 1$. Bounding this maximum from above is then reduced to a
problem of showing that if $p$ is a polynomial (indeed any holomorphic function) on the complex
plane and there exists a disk of nontrivial diameter where $|p(z)|$ is much smaller than $|p(0)|$ then
the maximum of $|p(z)|$ on the unit circle must be much larger than $|p(0)|$. This final result can
be viewed as a kind of uncertainty principle and is proved using tools from complex analysis (the
Hadamard three circle theorem, and the Möbius transform).

2 Reductions for Population Recovery 

Here we describe (informally) the reduction of Dvir et al [9] from the population recovery problem
to the problem of constructing a robust local inverse for a certain matrix $A$ (whose entries depend
on $\mu$): Recall that if we choose a string $a \in \{0,1\}^n$ (according to $\pi$), the observation is a (random)
string in $\{0,1,?\}^n$ obtained from $a$ by replacing each $a_i$ with '?' independently with probability
$1 - \mu$.

The first observation of Dvir et al [9] is that we may as well assume that we know all of the
strings $a$ whose probability $\pi(a)$ is at least $\Omega(\varepsilon)$. Of course, in the population recovery problem both the
strings and their probabilities are unknown, so how can we reduce from the case when everything is
unknown to the case where at least the set of strings with large probability is known? Suppose we
ignore all but the first $n'$ coordinates; then we get an instance of the population recovery problem
on length $n'$ strings. In particular, the probability $\pi(a')$ of a length $n'$ string is the total probability
of all length $n$ strings $a$ whose first $n'$ coordinates are exactly $a'$. Now the rough idea is that we
can incrementally solve the population recovery problem on longer and longer prefixes; each time
we increase the length of the prefix by one we at most double the number of candidate strings. The
crucial insight is that we can always prune the set of strings because we never need to keep a prefix
whose total probability is less than $\varepsilon$.
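
The prefix-extension idea can be sketched as follows. This is a simplified illustration, not the procedure of [9]: `prefix_prob` is a hypothetical oracle that returns an estimate of the total probability of all strings beginning with a given prefix, and in the toy run it is replaced by an exact computation from a known distribution.

```python
def recover_support(n, eps, prefix_prob):
    """Grow candidate prefixes one coordinate at a time, pruning light ones.

    prefix_prob(s) is a hypothetical oracle returning an estimate of the total
    probability of all strings that start with the prefix s.
    """
    candidates = [""]
    for _ in range(n):
        # Extending by one coordinate at most doubles the candidate set.
        extended = [s + b for s in candidates for b in "01"]
        # Never keep a prefix whose total probability is below eps.
        candidates = [s for s in extended if prefix_prob(s) >= eps]
    return candidates

# Toy check with an exact oracle (in the real problem this would be estimated).
pi = {"0000": 0.5, "1010": 0.3, "1111": 0.2}

def exact_prefix_prob(s):
    return sum(p for a, p in pi.items() if a.startswith(s))

print(recover_support(4, 0.1, exact_prefix_prob))   # ['0000', '1010', '1111']
```

Because the surviving prefixes of a given length have disjoint probability mass of at least `eps` each, the candidate list never holds more than `1/eps` entries after pruning when the oracle is exact.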

The second observation of Dvir et al [9] is that if all the strings are known, then it suffices to
estimate $\pi(0)$ within an additive $\varepsilon$. This type of reduction is standard: given a string $a$, we can
take each observation and XOR it with $a$ while keeping the symbol '?' unchanged. The samples we
are given can be thought of as samples from an instance of population recovery where every string
is mapped to its XOR with $a$, and so we can recover $\pi(a)$ by finding the probability of the all zero
string in this new instance of the problem.
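
The XOR step described above is easy to make concrete. In this small sketch the helper name `xor_with` is an assumption; it shifts an observation by a known string $a$ and leaves erased coordinates alone.

```python
def xor_with(observation, a):
    """XOR a lossy observation with the known string a, leaving '?' untouched."""
    return "".join(
        "?" if o == "?" else str(int(o) ^ int(c))
        for o, c in zip(observation, a)
    )

print(xor_with("1?10", "1010"))   # -> "0?00"
```

An observation of the string $a$ itself is always mapped to a string with no ones, so estimating $\pi(a)$ becomes estimating the weight of the all zero string in the shifted instance.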

The final simplification is: suppose we ignore the locations of the ones, zeros and question marks
in the samples and only record the number of ones. Then we can map the probability distribution
$\pi$ to a length $n + 1$ vector where $\pi(i)$ is the total probability of all strings with exactly $i$ ones. What
is the probability that we observe $j$ ones (and the remaining symbols are zeros or question marks)
given that the sample $a$ had $i$ ones? This quantity is exactly:

$$A_{i,j} = \binom{i}{j}\mu^{j}(1-\mu)^{i-j}$$

So if we only count the number of ones in each observation, we are given random samples from
the distribution $\pi^T A$. Hence, if our goal is to recover the probability $\pi(0)$ assigned to the all zero
string, and we ignore where the zeros, ones and question marks occur in our samples, we are faced
with a particular matrix $A$ (whose entries depend on $\mu$) for which we would like to construct a
robust local inverse.
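
A short sketch of this matrix, assuming the binomial form $A_{i,j} = \binom{i}{j}\mu^j(1-\mu)^{i-j}$ for the probability of observing $j$ ones given $i$ ones. The function names and the toy vector `pi_by_weight` are illustrative choices.

```python
from math import comb

def lossy_matrix(n, mu):
    """A[i][j] = C(i, j) * mu**j * (1 - mu)**(i - j): the chance of seeing j ones
    in the observation when the underlying string has i ones."""
    return [[comb(i, j) * mu**j * (1 - mu)**(i - j) if j <= i else 0.0
             for j in range(n + 1)] for i in range(n + 1)]

def push_forward(pi_by_weight, A):
    """phi[j] = sum_i pi[i] * A[i][j], i.e. phi^T = pi^T A."""
    size = len(A)
    return [sum(pi_by_weight[i] * A[i][j] for i in range(size)) for j in range(size)]

A = lossy_matrix(4, 0.3)
row_sums = [sum(row) for row in A]            # each row is a distribution
pi_by_weight = [0.5, 0.0, 0.3, 0.0, 0.2]      # toy: mass on strings with 0, 2, 4 ones
phi = push_forward(pi_by_weight, A)
print(row_sums)
print(phi, sum(phi))
```

The matrix is lower triangular, since erasures can only reduce the number of visible ones, and each row sums to one up to floating point rounding.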

Definition 2.1. Let $\sigma_n(\mu,\varepsilon)$ denote the minimum sensitivity of an $\varepsilon$-local inverse (i.e. the optimum
value of (1)).

The following family of vectors will play a crucial role in our analysis:

$$v^{\alpha} = (1, \alpha, \alpha^2, \ldots, \alpha^n)^T$$

Then it can be checked that setting $\alpha = 1 - 1/\mu$ gives the natural estimator (i.e. $v^{\alpha} = A^{-1}e_0$) and the
sensitivity of this estimator is exponentially large for $\mu < 1/2$. We prove:

Theorem 2.2. For all positive integers $n$ and all $\mu, \varepsilon > 0$ we have $\sigma_n(\mu,\varepsilon) \le (1/\varepsilon)^{f(\mu)}$ where
$f(\mu) = \frac{1}{\mu}\log\frac{2}{\mu}$.

Theorem 1.1 follows since, as discussed in Section 1.2, the number of samples we need to obtain
the desired approximation with high probability when using the best local inverse is $\sigma_n(\mu,\varepsilon)^2\,\mathrm{poly}(n, 1/\varepsilon)$.
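
The claim about the natural estimator can be checked numerically. The sketch below assumes the binomial form of $A$ and the vectors $v^\alpha = (1, \alpha, \ldots, \alpha^n)$; the helper names and the values $n = 10$, $\mu = 0.25$ are chosen only for illustration.

```python
from math import comb

def lossy_matrix(n, mu):
    """A[i][j] = C(i, j) * mu**j * (1 - mu)**(i - j) for j <= i, else 0."""
    return [[comb(i, j) * mu**j * (1 - mu)**(i - j) if j <= i else 0.0
             for j in range(n + 1)] for i in range(n + 1)]

def v_alpha(n, alpha):
    """The vector (1, alpha, alpha^2, ..., alpha^n)."""
    return [alpha**i for i in range(n + 1)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

n, mu = 10, 0.25
A = lossy_matrix(n, mu)

# Natural estimator: alpha = 1 - 1/mu, so that 1 + mu*(alpha - 1) = 0.
u = v_alpha(n, 1 - 1 / mu)
e0 = [1.0] + [0.0] * n
residual = max(abs(x - e) for x, e in zip(matvec(A, u), e0))
natural_sensitivity = max(abs(x) for x in u)      # equals (1/mu - 1)**n
print("||Au - e0||_inf =", residual)
print("||u||_inf       =", natural_sensitivity)

# Generic alpha: A v^alpha equals v^(1 + mu*(alpha - 1)), by the binomial theorem.
alpha = 0.5
gap = max(abs(x - y) for x, y in
          zip(matvec(A, v_alpha(n, alpha)), v_alpha(n, 1 + mu * (alpha - 1))))
print("Vandermonde identity gap =", gap)
```

For $\mu = 0.25$ the entries of $u$ grow like $3^i$, which illustrates why the natural estimator needs exponentially many samples once $\mu < 1/2$.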

3 A Transformed Linear Program 

As outlined earlier, the problem of finding an $\varepsilon$-local inverse can be expressed as a linear
programming problem whose objective is to minimize the sensitivity. We want to prove an upper bound on
the value of the solution, and we will accomplish this by instead bounding the maximum objective
function of the dual.

However, before passing to the dual we will apply a crucial change of basis to the linear program.
The reason we do this is so that the dual can then be interpreted as a certain maximization problem
over degree $n$ polynomials. We will choose $n + 1$ distinct values $\alpha_0, \alpha_1, \ldots, \alpha_n$ (as we'll see the particular
values won't matter) and we will consider the estimators $v^{\alpha_i}$ defined in the previous section. We
will abuse notation and refer to the estimator $v^{\alpha_i}$ as $v^i$. Since this family forms a basis, we can write
any local inverse $v$ in the form $v = \sum_{i=0}^{n} \lambda_i v^i$. Let $V$ be the matrix whose columns are $v^0, \ldots, v^n$ and let $B = AV$.
Then our new linear program is:

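$$
\begin{aligned}
\min\ & \sigma \\
& B\lambda \ge e_0 - \varepsilon \mathbf{1} \\
& -B\lambda \ge -e_0 - \varepsilon \mathbf{1} \\
& V\lambda + \sigma \mathbf{1} \ge 0 \\
& -V\lambda + \sigma \mathbf{1} \ge 0 \\
& \sigma \ge 0
\end{aligned}
\tag{2}
$$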

The final constraint is superfluous, but is helpful in formulating the dual linear program. 

The coefficient matrix $V$ is a Vandermonde matrix (i.e. each column has the form $v^{\alpha}$ for some
$\alpha$) with the entry in row $i$ and column $j$ given by $V_{ij} = (\alpha_j)^i$ (with $V_{0j} = 1$). In fact, it turns out
that $B$ is also a Vandermonde matrix whose $j$-th column is exactly $v^{1 + \mu(\alpha_j - 1)}$:

$$B_{ij} = \sum_{k \le i} A_{ik} V_{kj} = \sum_{k \le i} \binom{i}{k} \mu^k (1-\mu)^{i-k} \alpha_j^k = (1 + \mu(\alpha_j - 1))^i$$

Indeed, this simple form for $B$ is precisely the reason we chose this basis transformation.

The new linear program has $n + 2$ variables and $4(n + 1)$ constraints (consisting of four groups
of $n + 1$ constraints each) so the dual will have $4(n + 1)$ variables consisting of four vectors, denoted
by $p^+, p^-, q^+, q^-$, each indexed by $\{0, \ldots, n\}$. The resulting dual program is:

$$
\begin{aligned}
\max\ & p_0^+ - p_0^- - \varepsilon \sum_i (p_i^+ + p_i^-) \\
& (p^+ - p^-)^T B + (q^- - q^+)^T V = 0 \\
& \sum_i (q_i^+ + q_i^-) \le 1 \\
& p^+, p^-, q^+, q^- \ge 0
\end{aligned}
\tag{3}
$$

We can now make some simplifying observations. If for any $i$, both $p_i^+$ and $p_i^-$ are positive, we
can decrease them each by their minimum without violating the constraints, and only increasing
the objective function. So we may assume that at least one of them is zero. Similarly for $q_i^+$ and
$q_i^-$. Then we can define $p = p^+ - p^-$ and $q = q^+ - q^-$ to simplify the dual linear program to:

$$
\begin{aligned}
\max\ & p_0 - \varepsilon \sum_i |p_i| \\
& p^T B = q^T V \\
& \sum_i |q_i| \le 1
\end{aligned}
\tag{4}
$$

Define the polynomials $p(x) = \sum_{j=0}^{n} p_j x^j$ and $q(x) = \sum_{j=0}^{n} q_j x^j$. The equality constraint gives
$n + 1$ equations indexed from $0$ to $n$ where the $j$-th constraint is that $p(1 + \mu(\alpha_j - 1)) = q(\alpha_j)$. Since
$p(1 + \mu(x - 1))$ and $q(x)$ have degree at most $n$ and agree on $n + 1$ values they must be the same polynomial. This leads to
the following formulation:

The optimal sensitivity $\sigma_n(\mu, \varepsilon)$ is equal to the maximum of $p(0) - \varepsilon\|p\|_1$ over all degree
$n$ polynomials for which the translated polynomial $q(x) = p(1 + \mu(x - 1))$ satisfies
$\|q\|_1 \le 1$.

Recall that $\|p\|_1$ denotes the sum of the absolute values of the coefficients.
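
To make the polynomial formulation concrete, the following sketch computes the coefficients of the translated polynomial $q(x) = p(1 + \mu(x-1))$ from those of $p$ by expanding $((1-\mu) + \mu x)^i$, and evaluates the dual objective $p(0) - \varepsilon\|p\|_1$. The function names and the sample polynomial are assumptions made for the example.

```python
from math import comb

def translate(p, mu):
    """Coefficients of q(x) = p(1 + mu*(x - 1)) given the coefficients p[0..n]."""
    n = len(p) - 1
    return [sum(p[i] * comb(i, k) * mu**k * (1 - mu)**(i - k)
                for i in range(k, n + 1))
            for k in range(n + 1)]

def l1_norm(coeffs):
    """Sum of the absolute values of the coefficients."""
    return sum(abs(c) for c in coeffs)

def dual_objective(p, eps):
    """p(0) - eps * ||p||_1, where p(0) is the constant coefficient."""
    return p[0] - eps * l1_norm(p)

def evaluate(coeffs, x):
    return sum(c * x**k for k, c in enumerate(coeffs))

p = [1.0, -2.0, 3.0, 0.5]     # arbitrary cubic, for illustration
mu = 0.3
q = translate(p, mu)
# The two sides agree at any point, e.g. x = 0.7:
print(evaluate(q, 0.7), evaluate(p, 1 + mu * (0.7 - 1)))
print(l1_norm(q), dual_objective(p, 0.01))
```

A polynomial $p$ is feasible for the dual when `l1_norm(translate(p, mu))` is at most one, and any feasible $p$ gives a lower bound of `dual_objective(p, eps)` on the optimal sensitivity.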

So now our goal is to prove an upper bound on the maximum of this linear program. We can 
think of this as trying to show a type of uncertainty principle for the coefficients of a polynomial 
when applying an affine change of variables. There is a considerable amount of literature on 
establishing uncertainty principles for functions and their Fourier transforms (see e.g. [8]), but there 
seems to be no literature concerning other affine changes of variables (i.e. $p(1 + \mu(x - 1)) = q(x)$).
In fact, here we will establish such an uncertainty principle via the Hadamard three circle theorem 
in complex analysis. 

4 Sup Relaxations 

The quantities $\|p\|_1$ and $\|q\|_1$ are unwieldy, e.g. given just the graph of the polynomial, what can
we say about its coefficients? Here we will relax constraints on $\|p\|_1$ by instead considering the
maximum of the polynomial over certain domains.

Definition 4.1. (Restricted sup-norm.) For a subset $W$ of $\mathbb{R}$, let $\|q\|_{\sup}^{W} := \sup_{x \in W} |q(x)|$.

Recall that we used the notation $\|q\|_1$ to denote the sum of the absolute values of the coefficients
of $q$. Then it is easy to see that:

Claim 4.2. $\|q\|_1 \ge \|q\|_{\sup}^{[-1,1]}$

Proof. For each $x \in [-1, 1]$, $|q(x)| \le \sum_{i=0}^{n} |q_i||x|^i \le \sum_{i=0}^{n} |q_i| = \|q\|_1$.

In the polynomial formulation of $\sigma_n(\mu,\varepsilon)$, replacing the objective function by $p(0) - \varepsilon\|p\|_{\sup}^{[-1,1]}$
only increases the value of the objective function. Similarly, replacing the constraint $\|q\|_1 \le 1$
by $\|q\|_{\sup}^{[-1,1]} \le 1$ can only increase the objective function. Since $q(x) = p(1 + \mu(x - 1))$ and the
transformation $x \to 1 + \mu(x - 1)$ maps the interval $[-1,1]$ to the interval $[1 - 2\mu, 1]$ we have
$\|q\|_{\sup}^{[-1,1]} = \|p\|_{\sup}^{[1-2\mu,1]}$. This leads to a relaxation of the polynomial formulation:

The optimal sensitivity $\sigma_n(\mu, \varepsilon)$ is at most the maximum of $p(0) - \varepsilon\|p\|_{\sup}^{[-1,1]}$ over all
degree $n$ polynomials for which $\|p\|_{\sup}^{[1-2\mu,1]} \le 1$.

For this relaxation to be useful to us we will need to prove that the new objective function
cannot be too large if $p$ satisfies the constraints of the relaxation. Informally, we will say that a
polynomial is bad if it satisfies the constraints of the relaxation and makes the objective function
very large. If $\mu > 1/2$ then $0 \in [1 - 2\mu, 1]$ and so $|p(0)| \le 1$ and no polynomial can be bad. So
assume $\mu < 1/2$. A bad polynomial must be bounded between $-1$ and $1$ on the interval $[1 - 2\mu, 1]$,
must be very large at the origin, and must have $|p(x)|$ at most $|p(0)|/\varepsilon$ for all $x \in [-1, 1]$.

Is there any polynomial that satisfies these conditions? Unfortunately for this approach, there
is. The polynomial $(1 - x^2)^{n/2}$ has its maximum on $[-1, 1]$ at the origin, where it is 1, and its
maximum on $[1 - 2\mu, 1]$ is at $1 - 2\mu$ where its value is $C = (1 - (1-2\mu)^2)^{n/2} = (4\mu(1-\mu))^{n/2}$ which is exponentially
small. Thus the polynomial $p(x) = \frac{1}{C}(1 - x^2)^{n/2}$ satisfies the constraints and has objective function
value that is exponentially large in $n$.

To salvage this approach we move to complex numbers. The definition of the restricted
sup-norm extends directly to subsets $W$ of the complex numbers. For $\beta \in \mathbb{C}$ and positive real number
$\gamma$ let $D_\gamma(\beta)$ be the closed disk in the complex plane of radius $\gamma$ centered at $\beta$. Let $C_\gamma(\beta)$ be the
circle bounding $D_\gamma(\beta)$. If $\beta = 0$ we write simply $D_\gamma$ and $C_\gamma$. As with Claim 4.2, we have:

Claim 4.3. $\|q\|_1 \ge \|q\|_{\sup}^{D_1}$

Observe that the image of the disk $D_1$ under the transformation $x \to 1 + \mu(x - 1)$ is $D_\mu(1 - \mu)$.
Just as before we obtain the following relaxation:

The optimal sensitivity $\sigma_n(\mu, \varepsilon)$ is at most the maximum of $p(0) - \varepsilon\|p\|_{\sup}^{D_1}$ over all degree
$n$ polynomials such that $\|p\|_{\sup}^{D_\mu(1-\mu)} \le 1$.

As we will see, there are no bad polynomials for this relaxation. In hindsight, it is not surprising
that the values of the polynomial $p(x)$ over the whole complex disk reveal much more information
than just the values on $[-1,1]$; in particular, we can recover the values of a polynomial from
integrating around the circle, so a polynomial cannot stay too small on the boundary of the disk
if it is large at the origin. In particular the polynomial $(1 - x^2)^{n/2}$ that was bad for the $\|\cdot\|_{\sup}^{[-1,1]}$
relaxation is no longer bad because its maximum (on $D_1$) is attained at $x = i$ and is exponentially
large.
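
The contrast between the real interval and the complex disk can be seen numerically. The sketch below rescales $(1 - x^2)^{n/2}$ so that it is at most one on $[1 - 2\mu, 1]$, then compares its size at the origin, on a grid over $[-1,1]$, and at $x = i$. The values $n = 40$ and $\mu = 0.1$ are assumptions chosen only for illustration.

```python
def bad_poly(x, n):
    """Unnormalised candidate (1 - x^2)^(n/2); n is assumed even."""
    return (1 - x * x) ** (n // 2)

n, mu = 40, 0.1
C = bad_poly(1 - 2 * mu, n)            # largest value on [1 - 2*mu, 1]

def p(x):
    """Rescaled polynomial, at most 1 in absolute value on [1 - 2*mu, 1]."""
    return bad_poly(x, n) / C

grid = [k / 1000 for k in range(-1000, 1001)]
sup_on_interval = max(abs(p(x)) for x in grid)   # attained at x = 0
value_at_i = abs(p(1j))                          # a point on the unit circle

print("p(0)                   =", p(0))
print("sup on [-1, 1] (grid)  =", sup_on_interval)
print("|p(i)|                 =", value_at_i)
print("|p(i)| / p(0)          =", value_at_i / p(0))   # 2**(n/2)
```

On the real interval the supremum equals $p(0)$, so the penalty term $\varepsilon\|p\|_{\sup}^{[-1,1]}$ barely reduces the objective. On the disk the supremum is at least $2^{n/2}p(0)$, so the objective $p(0) - \varepsilon\|p\|_{\sup}^{D_1}$ becomes negative as soon as $\varepsilon 2^{n/2} > 1$.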

In the next section we will prove:

Lemma 4.4. Let $h$ be a holomorphic function and suppose $D = D_\rho(\beta)$ is a disk contained in
$D_1$ such that $\|h\|_{\sup}^{D} \le 1$. Then there is a point $x \in C_1$ such that $|h(x)| \ge |h(0)|^{1+d}$, where
$d = (1 - |\beta|)/\log(2/\rho)$.

From this uncertainty principle, we can now prove Theorem 2.2. 

Proof of Theorem 2.2. We use the bound from the $D_1$-sup relaxation. Let $p$ be a polynomial
satisfying the constraints and let $s = |p(0)|$. Then $p$ satisfies the conditions of Lemma 4.4 with
$\beta = 1 - \mu$ and $\rho = \mu$. Therefore $\|p\|_{\sup}^{D_1} \ge s^{1+d}$ where $d$ is as in the lemma. From this we conclude
that the objective function in the $\|\cdot\|_{\sup}^{D_1}$ relaxation is at most $s - \varepsilon s^{1+d}$ which is maximized when
$s = (1/((d + 1)\varepsilon))^{1/d}$ and this quantity is itself an upper bound on the objective function. We can
therefore conclude that $\sigma_n(\mu,\varepsilon) \le (1/\varepsilon)^{1/d}$ where the exponent $1/d$ is equal to $\frac{1}{\mu}\log\frac{2}{\mu}$.
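
For completeness, the maximization step in the proof above can be written out as a short calculation. This is a routine derivation added for the reader, using only the quantities $s$, $d$, $\varepsilon$ already defined, with the substitution $\beta = 1 - \mu$, $\rho = \mu$ in the last line.

```latex
\begin{align*}
g(s) &= s - \varepsilon s^{1+d}, \qquad
g'(s) = 1 - (1+d)\,\varepsilon\, s^{d} = 0
\iff s^{*} = \left(\frac{1}{(1+d)\,\varepsilon}\right)^{1/d}, \\
g(s^{*}) &= s^{*} - \varepsilon (s^{*})^{d}\, s^{*}
          = s^{*}\left(1 - \frac{1}{1+d}\right)
          \le s^{*} \le \left(\frac{1}{\varepsilon}\right)^{1/d}, \\
\frac{1}{d} &= \frac{\log(2/\rho)}{1 - |\beta|}
            = \frac{\log(2/\mu)}{\mu}
            = \frac{1}{\mu}\log\frac{2}{\mu}.
\end{align*}
```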



5 Proof of Lemma 4.4 

Here we will prove the uncertainty principle stated in the previous section using tools from complex
analysis. Perhaps one of the most useful theorems in understanding the rate of growth of
holomorphic functions in the complex plane is Hadamard's Three Circle Theorem (and the related Three
Lines Theorem):

Theorem 5.1. Let $0 < a < b < c$ and let $g(x)$ be a holomorphic function on $D_c$. Then

$$\log\frac{c}{a}\,\log\|g\|_{\sup}^{C_b} \le \log\frac{c}{b}\,\log\|g\|_{\sup}^{C_a} + \log\frac{b}{a}\,\log\|g\|_{\sup}^{C_c}$$



In Lemma 4.4 we do not have three concentric circles but we can apply a Möbius transformation
to put the problem in the right form. Let $\beta$ be the center of the disk $D$ in the lemma and consider
the transformation $\phi(x) = \phi_\beta(x) = \frac{\beta + x}{1 + \beta^* x}$, where $(\cdot)^*$ denotes complex conjugate. The following
fact is well known and easy to check:

Fact 5.2. For $|\beta| < 1$, $\phi_\beta$ is a holomorphic function which maps $D_1$ to itself.

1. $\phi(C_1) = C_1$.

2. $0 \in \phi(C_{|\beta|})$.

3. $\phi(C_{\rho/2}) \subseteq D = D_\rho(\beta)$.
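
The three properties in Fact 5.2 can be sanity-checked numerically. The sketch below assumes the form $\phi_\beta(x) = (\beta + x)/(1 + \beta^* x)$ and uses arbitrary illustrative values of $\beta$ and $\rho$ with $D_\rho(\beta) \subseteq D_1$; the helper names are not from the original text.

```python
import cmath

def mobius(x, beta):
    """phi_beta(x) = (beta + x) / (1 + conj(beta) * x)."""
    return (beta + x) / (1 + beta.conjugate() * x)

def circle(radius, points=360):
    """Evenly spaced points on the circle of the given radius around 0."""
    return [radius * cmath.exp(2j * cmath.pi * t / points) for t in range(points)]

beta, rho = 0.6 + 0.2j, 0.3     # illustrative values with |beta| + rho < 1

# 1. The unit circle is mapped into the unit circle.
unit_circle_error = max(abs(abs(mobius(x, beta)) - 1) for x in circle(1.0))
# 2. The point -beta lies on the circle of radius |beta| and is sent to 0.
image_of_minus_beta = abs(mobius(-beta, beta))
# 3. Points of modulus rho/2 land within distance rho of beta.
max_distance = max(abs(mobius(x, beta) - beta) for x in circle(rho / 2))

print(unit_circle_error, image_of_minus_beta, max_distance)
```

A numerical check on sampled points is of course not a proof; the argument in the text covers every point of the circles.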

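The first claim is a standard fact about Möbius transformations. The second follows from
$\phi(-\beta) = 0$. For the third, for any $x \in D_1$,

$$|\phi(x) - \beta| = \left|\frac{\beta + x}{1 + \beta^* x} - \beta\right| = \left|\frac{x(1 - |\beta|^2)}{1 + \beta^* x}\right| \le |x|\,\frac{1 - |\beta|^2}{1 - |\beta|} = |x|(1 + |\beta|) \le 2|x|$$

so every $x$ with $|x| = \rho/2$ satisfies $|\phi(x) - \beta| \le \rho$.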



Now consider the function $g$ defined on $D_1$ by $g(x) = h(\phi(x))$. From the three previous
observations we have:

1. $g(C_1) = h(C_1)$ and so $\|g\|_{\sup}^{C_1} = \|h\|_{\sup}^{C_1}$.

2. $h(0) \in g(C_{|\beta|})$ so $\|g\|_{\sup}^{C_{|\beta|}} \ge |h(0)|$.

3. $g(C_{\rho/2}) \subseteq h(D)$ so $\|g\|_{\sup}^{C_{\rho/2}} \le \|h\|_{\sup}^{D} \le 1$, by the hypothesis of the lemma.

Applying Theorem 5.1 with $a = \rho/2$, $b = |\beta|$ and $c = 1$ we get:

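$$\log\frac{2}{\rho}\,\log\|g\|_{\sup}^{C_{|\beta|}} \le \log\frac{1}{|\beta|}\,\log\|g\|_{\sup}^{C_{\rho/2}} + \log\frac{2|\beta|}{\rho}\,\log\|g\|_{\sup}^{C_1}$$

which when combined with the three previous bounds gives:

$$\log\frac{2}{\rho}\,\log|h(0)| \le \log\frac{2|\beta|}{\rho}\,\log\|h\|_{\sup}^{C_1}$$

from which we conclude:

$$\|h\|_{\sup}^{C_1} \ge |h(0)|^{t}$$

where $t = \log(2/\rho)/\log(2|\beta|/\rho) = 1 + \log(1/|\beta|)/\log(2|\beta|/\rho) \ge 1 + (1 - |\beta|)/\log(2/\rho) = 1 + d$, and $d$ is the parameter
defined in the lemma.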
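
The final inequality $t \ge 1 + d$ is easy to check numerically for the values used in the proof of Theorem 2.2, namely $\beta = 1 - \mu$ and $\rho = \mu$. The sketch below uses natural logarithms and a few illustrative values of $\mu$; the function names are assumptions made for the example.

```python
from math import log

def exponent_t(beta, rho):
    """t = log(2/rho) / log(2*beta/rho), for 0 < rho/2 < beta < 1."""
    return log(2 / rho) / log(2 * beta / rho)

def one_plus_d(beta, rho):
    """1 + d with d = (1 - beta) / log(2/rho), as in Lemma 4.4."""
    return 1 + (1 - beta) / log(2 / rho)

results = []
for mu in (0.4, 0.2, 0.05):
    beta, rho = 1 - mu, mu
    t, lower = exponent_t(beta, rho), one_plus_d(beta, rho)
    results.append((mu, t, lower, log(2 / mu) / mu))   # last entry is 1/d
    print(mu, t, lower, log(2 / mu) / mu)
```

The last column is $1/d = \frac{1}{\mu}\log\frac{2}{\mu}$, the exponent that appears in Theorem 2.2, and it grows quickly as $\mu$ shrinks.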

6 Open Question 

Is there a polynomial time algorithm for population recovery when attributes are not deleted, but
are flipped (with probability $\eta < 1/2$)? It seems that new ideas are needed to handle this case
in part because if we try the same method of writing a linear program over a basis of estimators,
then instead of two polynomials related by an affine change of variables, we get two polynomials
$p(x)$ and $q(x)$ for which $p(x) = \ell(x)^n q(\phi(x))$ where $\ell(x)$ is a linear function and $\phi(x)$ is a Möbius
transformation. However this damping term $\ell(x)^n$ makes it much easier for $q(x)$ to be bounded in
the complex disk.

Another interesting question is whether or not the dependence on $\mu$ can be improved: Our
algorithm runs in polynomial time for any fixed $\mu > 0$, however the exponent of this polynomial is
$O(\frac{1}{\mu}\log\frac{1}{\mu})$. In comparison, the algorithm of Wigderson and Yehudayoff runs in quasi-polynomial
time but the exponent is $\Theta(\log\frac{1}{\mu})$. Is there a polynomial time algorithm whose exponent depends
logarithmically on $\frac{1}{\mu}$?